This note introduces notation and definitions for information theoretic quantities in the
context of quantum systems, such as (conditional) entropy and (conditional) mutual information.
We employ the natural C*-algebra formalism, and it turns out that there is a pervasive duality
of language: everything can be defined for (compatible) observables, but also for (compatible)
C*-subalgebras. The two approaches are unified in the formalism of quantum operations, and they are
connected by a very satisfying inequality, generalizing the well known Holevo bound. Then we turn
to communication via (discrete memoryless) quantum channels: we formulate the Fano inequality,
bound the capacity region of quantum multiway channels, and comment on the quantum broadcast
channel.



I. INTRODUCTION 

After the beginnings of quantum information theory in the sixties [?], and Holevo's now widely known investigations
of the seventies, there is today again a tremendous interest in this field. This interest focuses on two areas which
may be described, slightly abusing language introduced by Holevo twenty years ago [?], as classical-quantum problems
on the one hand, and quantum-quantum problems on the other, and it mostly derives from the latter, as these include
all problems of (quantum) information processing inside a quantum computer or memory. Whereas this area (which
is characterized by its attention to entanglement) poses many new and beautiful, and also very difficult problems, the
present note is concerned wholly with the former area (though it is by now not altogether clear how to separate these
two worlds, cf. e.g. the opinion uttered by Adami and Cerf [?]). We take the view that classical-quantum problems are
those in which classical information has to be stored in or sent through some quantum system. Examples from recent
work are the determination of the quantum channel capacity for fixed input states [3]-[?], quantum cryptographic
protocols [10], and entanglement enhanced transmission (superdense coding) [?].

Our approach is somewhat reminiscent of "quantum probability" through its formulation in terms of C*-algebras and 
its emphasis on observable operators (which reflects our dwelling in the classical-quantum area) , but we cannot respect 
the bounds of this field: we will use positive operator valued measures (instead of unbounded selfadjoint operators), 
and we will consider quantum operations, both quite uncommon in noncommutative probability. Finally it should 
be noted that we hardly present any new concepts or results — our contribution lies in introducing a reasonable and 
efficient calculus. 

The outline of the paper is as follows: in section II we will basically recall the language of C*-algebras, completely
positive maps, positive operator valued measures, and the notion of compatibility. In the following sections III
and IV we will define various information theoretic quantities, first for observables, second for *-subalgebras. In
section V we will unify these approaches using completely positive C*-algebra maps, and can give meaning to some
hybrid expressions in section VI. The observable and subalgebra notions will be brought together in section VII
where we prove an information inequality in generalization of the Holevo bound. Up to this point the work consists
in the definition of concepts and information theoretic quantities, and proving some simple numerical relations.
The last section VIII will discuss the application of these concepts to quantum channels, stating a Fano inequality,
and determining a bound on the capacity region of the quantum multiway channel. We conclude by making some
observations for the quantum broadcast channel.

About notation: finite sets will be denoted $\mathcal{A},\mathcal{B},\ldots$; the functions exp and log are always to base 2.



*Electronic address: winter@mathematik.uni-bielefeld.de 






II. MATHEMATICAL DESCRIPTION OF QUANTUM SYSTEMS 



In classical probability theory one generally has two ways of seeing things: either through distributions (and
the relation of their images, mostly marginals), or through random variables (with a common distribution). Both
ways have their merits (though random variables are considered more elegant), but basically they are equivalent;
in particular neither lacks anything without the other. Things are different in quantum probability, and we will take
the following view: the analog of a distribution is a density operator on some complex Hilbert space, whereas the
analog of random variables are observables, defined below. With density operators alone we can study physical
processes transforming them, but every experiment involves some observable. Studying observables one usually fixes
the underlying density operator (as the statistics of the experiments depend on the latter), but this falls short of
appropriately reflecting our manipulating quantum states, or having several alternative states.

For the following we refer to textbooks on C*-algebras like Arveson [12], Dixmier [13], and standard references on
basic mathematics of quantum mechanics: Davies [14], Kraus [15], and the more advanced [16] by Holevo.

A. Systems and their states 

A C*-algebra with unit is a Banach space $\mathfrak{A}$ which is also a $\mathbb{C}$-algebra with unit 1, and a $\mathbb{C}$-antilinear involution
*, such that

$$\|AB\| \le \|A\|\,\|B\|, \qquad \|A^*\|^2 = \|A\|^2 = \|AA^*\|.$$

These algebras will be the mathematical models for quantum systems, and subsystems are simply *-subalgebras.
The set $\mathfrak{A}_+$ of $A \in \mathfrak{A}$ that can be written as $A = BB^*$ is called the positive cone of $\mathfrak{A}$; it is norm closed, and induces
a partial order $\le$. By the famous Gelfand-Naimark-Segal representation theorem (see e.g. [?]) every C*-algebra is
isomorphic to a closed *-subalgebra of some $\mathcal{L}(\mathcal{H})$, the algebra of bounded linear operators on the Hilbert space $\mathcal{H}$. In
this note all C*-algebras will be of finite dimension. It is known that those algebras are isomorphic to a direct sum of
$\mathcal{L}(\mathcal{H}_i)$ (see e.g. Arveson [12]).^1 This includes as extremal cases the algebras $\mathcal{L}(\mathcal{H})$, and the commutative algebras $\mathbb{C}\mathcal{X}$
over a finite set $\mathcal{X}$. In particular we have on every such algebra a well defined and unique trace functional, denoted
Tr, that assigns trace one to all minimal positive idempotents.

A state on a C*-algebra $\mathfrak{A}$ is a positive $\mathbb{C}$-linear functional $\rho$ with $\rho(1) = 1$. Positivity here means that its values
on the positive cone are nonnegative. Clearly the states form a convex set $\mathfrak{S}(\mathfrak{A})$ whose extreme points are called
pure states, all others are mixed. One can easily see that every state $\rho$ can be represented uniquely in the form
$\rho(X) = \mathrm{Tr}(\rho X)$ for a positive, selfadjoint element $\rho$ of $\mathfrak{A}$ with trace one (such elements are called density operators).
In general this is only true for so-called normal states, which means that for an increasing sequence $A_n$ converging in
norm to $A$ the values $\rho(A_n)$ converge to $\rho(A)$. In the sequel we will therefore make no distinction between $\rho$ and its
density operator $\rho$. The set of operators with finite trace will be denoted $\mathfrak{A}_*$, the trace class in $\mathfrak{A}$, which contains the
states and is a two-sided ideal in $\mathfrak{A}$, the Schatten ideal [18]. $\mathrm{Tr}(\rho A)$ then defines a real bilinear and nondegenerate
pairing of $\mathfrak{A}_{*s}$ and $\mathfrak{A}_s$, the selfadjoint parts of $\mathfrak{A}_*$ and $\mathfrak{A}$, which makes $\mathfrak{A}_s$ the dual of $\mathfrak{A}_{*s}$. Notice that in this sense
pure states are equivalently described as minimal selfadjoint idempotents of $\mathfrak{A}$.

B. Observables 

Let $\mathcal{F}$ be a $\sigma$-algebra on some set $\Omega$, $\mathfrak{X}$ a C*-algebra. A map $X : \mathcal{F} \to \mathfrak{X}$ is called a positive operator valued
measure (POVM), or an observable, with values in $\mathfrak{X}$ (or on $\mathfrak{X}$), if:

1. $X(\emptyset) = 0$, $X(\Omega) = 1$.

2. $E \subset F$ implies $X(E) \le X(F)$.

3. If $(E_n)_n$ is a countable family of pairwise disjoint sets in $\mathcal{F}$ then $X(\bigcup_n E_n) = \sum_n X(E_n)$ (in general the
convergence is to be understood in the weak topology: for every state its value at the left equals the limit value
at the right hand side).

1 It is certainly the case that most of the material presented may be generalized to infinite dimensional algebras (see e.g.
Ohya/Petz [17]). We decided not to try for several reasons: one is that in information theory the interesting things already
happen in the discrete and even finite domain, another (decisive) that the present author is only a stumbling beginner in the
vast field of C*-algebras. At least it seems clear that the bulk of the things presented here carries over to algebras which are
isomorphic to countable sums of full (bounded) operator algebras of separable Hilbert spaces: there we have trace, well behaved
tensor products, and the Schatten decomposition (diagonalization) of density operators.

If the values of the observable are all projection operators and $\Omega$ is the real line one speaks of a spectral measure or a
von Neumann observable.^2 An observable $X$ together with a state $\rho$ yields a probability measure $P^X$ on $\Omega$ via

$$P^X(E) = \mathrm{Tr}(\rho X(E)).$$

In this way we may view $X$ as a random variable with values in $\mathfrak{X}$; its distribution we denote $P_X$ (note that $P_X$ may
not be isomorphic to $P^X$: this happens if $X$ takes the same value on disjoint events, which means that $X$ introduces
randomness by itself).
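As a small numerical illustration of how an observable and a state produce a probability distribution, the following sketch evaluates $\mathrm{Tr}(\rho X_j)$ for a three-outcome POVM on a qubit. The state `rho` and the "trine" POVM are illustrative choices that do not appear in the text; only the rule $P(j) = \mathrm{Tr}(\rho X_j)$ is taken from it.

```python
import numpy as np

# An illustrative qubit density operator (positive, trace one); an assumption.
rho = np.array([[0.75, 0.25],
                [0.25, 0.25]], dtype=complex)

def trine_povm():
    """Three-outcome qubit POVM X_j = (2/3)|v_j><v_j| (not projection valued)."""
    atoms = []
    for k in range(3):
        theta = 2 * np.pi * k / 3              # Bloch angle of v_k
        v = np.array([np.cos(theta / 2), np.sin(theta / 2)], dtype=complex)
        atoms.append((2 / 3) * np.outer(v, v.conj()))
    return atoms

def outcome_distribution(rho, povm):
    """P(j) = Tr(rho X_j) for every atom X_j of the observable."""
    return np.array([np.trace(rho @ Xj).real for Xj in povm])

povm = trine_povm()
P = outcome_distribution(rho, povm)
print("resolution of the unit:", np.allclose(sum(povm), np.eye(2)))
print("distribution:", P, "sum:", P.sum())
```

Since the atoms are positive and sum to 1, the numbers `P` are nonnegative and sum to one for any state, although no atom is a projection.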

Two observables $X$, $Y$ are said to be compatible, if they have values in the same algebra and $XY = YX$ elementwise,
i.e. for all $E \in \mathcal{F}_X$, $F \in \mathcal{F}_Y$: $X(E)Y(F) = Y(F)X(E)$ (note that it is possible for an observable not to be
compatible with itself). By the way, the term compatible may be defined in the obvious manner for arbitrary sets or
collections of operators, in which meaning we will use it in the sequel. If $X$, $Y$ are compatible we may define their
joint observable $XY : \mathcal{F}_X \times \mathcal{F}_Y \to \mathfrak{X}$ mapping $E \times F$ to $X(E)Y(F)$ (this defines the product mapping uniquely just
as in the classical case of product measures). In fact we can analogously define the joint observable for any collection
of pairwise compatible observables.^3 As the random variable of a product $XY$ we will take $X \times Y$, rather than $XY$
itself, with values in $\mathfrak{X} \times \mathfrak{X}$ (because the same product operator may be generated in two different ways which we
want to distinguish). To indicate this difference we will sometimes write $X \cdot Y$ for the product.

Note that here we can see the reason why we cannot just consider all observables as random variables (and forget
about the state): they will not have a joint distribution, at first of course only by our definition. But Bell's theorem [19]
shows that one comes into serious trouble if one tries to allow a joint distribution for noncompatible observables.
Conversely we see why we cannot do without observables, even though $\rho$ contains all possible information: the crux is
that we cannot access it due to the forbidden noncompatible observables (a good account of this aspect of quantum
theory is in [20]).

From now on all observables will be countable, i.e. w.l.o.g. they are defined on a countable $\Omega$ with $\sigma$-algebra $2^\Omega$.
This means that we may view an observable $X$ as a resolution of 1 into a countable sum $1 = \sum_{j\in\Omega} X_j$ of positive
operators $X_j$.

If $\mathfrak{A}_1$, $\mathfrak{A}_2$ are subalgebras of $\mathfrak{A}$, they are compatible if they commute elementwise (again note that a subalgebra need
not be compatible with itself: in fact it is iff it is commutative). In this case the closed subalgebra generated (in
fact: spanned) by the products $A_1A_2$, $A_i \in \mathfrak{A}_i$, is denoted $\mathfrak{A}_1\mathfrak{A}_2$.



C. Quantum operations 

Now we describe the transformations between quantum systems: a $\mathbb{C}$-linear map $\varphi : \mathfrak{A}_2 \to \mathfrak{A}_1$ is called a quantum
operation if it is completely positive (i.e. positive, so that positive elements have positive images, and also the $\varphi \otimes \mathrm{id}_n$
are positive, where $\mathrm{id}_n$ is the identity on the algebra of $n \times n$-matrices), and unit preserving. These maps are in
1-1 correspondence with their (pre-)adjoints $\varphi_*$ by the trace form, mapping states to states, and being completely
positive and trace preserving.^4 Since here we restrict ourselves to finite dimensional algebras the adjoint map simply
goes from $\mathfrak{A}_1$ to $\mathfrak{A}_2$, but to keep things well separated (which they actually are in the infinite case) we write the
adjoint as $\varphi_* : \mathfrak{A}_{1*} \to \mathfrak{A}_{2*}$, the dual map (in fact we consider this as the primary object and the operator maps as
their adjoint, which is the reason for writing subscript *). Notice that $\varphi_*$ is sometimes considered as restricted to
$\varphi_* : \mathfrak{S}(\mathfrak{A}_1) \to \mathfrak{S}(\mathfrak{A}_2)$. A characterization of quantum operations is by the Stinespring dilation theorem [21]:



2 Strictly speaking this term only applies to the expectation of the measure (in general an unbounded operator), but this in
turn by the spectral theorem determines the measure.

3 Observe however that in general a joint observable might exist for non-compatible (i.e. non-commuting) observables. The
operational meaning of this is that there is a common refinement of the involved observables. If they commute then this
certainly is possible as demonstrated, but commutativity is not necessary.

4 In general this is only true if we restrict $\varphi$ to be a normal map, see Davies [14].

Theorem 1 (Dilation) Let $\varphi : \mathfrak{A} \to \mathcal{L}(\mathcal{H})$ a linear map of C*-algebras. Then $\varphi$ is completely positive if and only if
there exist a representation $\alpha : \mathfrak{A} \to \mathcal{L}(\mathcal{K})$, Hilbert space $\mathcal{K}$, and a bounded linear map $V : \mathcal{H} \to \mathcal{K}$ such that

$$\forall A \in \mathfrak{A} \qquad \varphi(A) = V^*\alpha(A)V.$$

For proof see e.g. [?].
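The dilation form can be tried out numerically. The sketch below is only an illustration under stated assumptions: it takes $\mathfrak{A} = \mathcal{L}(\mathbb{C}^3)$, the representation $\alpha(A) = A \otimes 1$ on $\mathcal{K} = \mathbb{C}^3 \otimes \mathbb{C}^4$, and a random isometry $V$ (all dimensions and the names `phi`, `phi_star` are hypothetical choices). It then checks that $\varphi(A) = V^*\alpha(A)V$ is unit preserving and that its pre-adjoint, $\rho \mapsto \mathrm{Tr}_{env}(V\rho V^*)$, satisfies the trace duality $\mathrm{Tr}(\rho\,\varphi(A)) = \mathrm{Tr}(\varphi_*(\rho)A)$.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, d_env = 2, 3, 4   # H = C^d_in, A = L(C^d_out), K = C^d_out (x) C^d_env

# Random isometry V : H -> K from a reduced QR decomposition (V^* V = 1).
M = rng.normal(size=(d_out * d_env, d_in)) + 1j * rng.normal(size=(d_out * d_env, d_in))
V, _ = np.linalg.qr(M)

def phi(A):
    """Operator picture: phi(A) = V^* (A (x) 1) V, i.e. alpha(A) = A (x) 1."""
    return V.conj().T @ np.kron(A, np.eye(d_env)) @ V

def phi_star(rho):
    """State picture (pre-adjoint): rho -> partial trace over the ancilla of V rho V^*."""
    big = (V @ rho @ V.conj().T).reshape(d_out, d_env, d_out, d_env)
    return np.einsum('iaja->ij', big)

# A selfadjoint test operator in A and a pure state on H.
A = rng.normal(size=(d_out, d_out))
A = A + A.T
psi = rng.normal(size=d_in) + 1j * rng.normal(size=d_in)
rho = np.outer(psi, psi.conj()) / np.vdot(psi, psi).real

print("unit preserving:", np.allclose(phi(np.eye(d_out)), np.eye(d_in)))
print("duality gap:", abs(np.trace(rho @ phi(A)) - np.trace(phi_star(rho) @ A)))
```

Choosing $V$ to be an isometry is what makes $\varphi$ unit preserving; a general bounded $V$ would still give a completely positive map, but not a quantum operation in the sense used here.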



D. Entropy and divergence 

We will talk about information theory, so we need a concept of entropy: the von Neumann entropy $H(\rho) =
-\mathrm{Tr}(\rho \log \rho)$ (introduced in [22]) of a state $\rho$ (which reduces to the usual Shannon entropy for a commutative algebra
because then a state is nothing but a probability distribution). For states $\rho$, $\sigma$ we also introduce the I-divergence (first
defined by Umegaki [23]), or simply divergence, as $D(\rho\|\sigma) = \mathrm{Tr}(\rho(\log\rho - \log\sigma))$ with the convention that this is $\infty$
if $\mathrm{supp}\,\rho \not\le \mathrm{supp}\,\sigma$ ($\mathrm{supp}\,\rho$ being the support of $\rho$, the minimal selfadjoint idempotent $p$ with $p\rho p = \rho$). For properties
of these quantities we will often refer to [17], and to [23]. Two important facts we will use are

Theorem 2 (Klein inequality) For positive operators $\rho$, $\sigma$ (not necessarily states)

$$D(\rho\|\sigma) \ge \tfrac{1}{2}\mathrm{Tr}(\rho - \sigma)^2 + \mathrm{Tr}(\rho - \sigma).$$
In particular for states the divergence is nonnegative.

Proof. See [?]. □

Theorem 3 (Monotonicity) Let $\rho$, $\sigma$ be states on a C*-algebra $\mathfrak{A}$, and $\varphi_*$ a trace preserving, completely positive
linear map from states on $\mathfrak{A}$ to states on $\mathfrak{B}$. Then

$$D(\varphi_*\rho\|\varphi_*\sigma) \le D(\rho\|\sigma).$$

Proof. Uhlmann [25]; the situation we are in was already solved by Lindblad [26]. For a textbook account see [17]. □
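The two quantities of this subsection, and the monotonicity theorem in the special case where $\varphi_*$ is a partial trace, can be sketched as follows. The helper names and the random full-rank test states are assumptions made for illustration; logarithms are to base 2 as in the text, and the divergence routine below only handles full-rank arguments.

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy H(rho) = -Tr(rho log2 rho), with 0 log 0 = 0."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def logm2(rho):
    """Base-2 logarithm of a strictly positive Hermitian matrix via its eigenbasis."""
    ev, U = np.linalg.eigh(rho)
    return (U * np.log2(ev)) @ U.conj().T

def divergence(rho, sigma):
    """D(rho||sigma) = Tr(rho (log rho - log sigma)); full rank assumed here."""
    return float(np.trace(rho @ (logm2(rho) - logm2(sigma))).real)

def random_state(d, rng):
    """Random full-rank density matrix G G^* / Tr(G G^*)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def trace_out_second(rho, d1, d2):
    """Partial trace over the second factor of C^d1 (x) C^d2."""
    return np.einsum('iaja->ij', rho.reshape(d1, d2, d1, d2))

rng = np.random.default_rng(1)
rho, sigma = random_state(4, rng), random_state(4, rng)
D_full = divergence(rho, sigma)
D_reduced = divergence(trace_out_second(rho, 2, 2), trace_out_second(sigma, 2, 2))
print("D before and after the partial trace:", D_full, D_reduced)
```

The partial trace is one particular trace preserving, completely positive map; the theorem of course covers all of them.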



III. OBSERVABLE LANGUAGE 



Fix a state on a C*-algebra, say $\rho$ on $\mathfrak{A}$, and let $X$, $Y$, $Z$ be compatible observables on $\mathfrak{A}$.
By the previous section these are then random variables with a joint distribution, and one defines entropy $H(X)$,
conditional entropy $H(X|Y)$, mutual information $I(X \wedge Y)$, and conditional mutual information $I(X \wedge Y|Z)$ for these
observables as the respective quantities for them interpreted as random variables. Note however that these depend
on the underlying state $\rho$. In case of need we will thus add the state as an index, like $H_\rho(X) = H(X)$, etc.
As things are there is not much to say about that part of the theory. We only note some useful formulas:

$$H(X|Y) = \sum_j \mathrm{Tr}(\rho Y_j)\, H_{\rho_j}(X), \qquad \text{with } \rho_j = \frac{1}{\mathrm{Tr}(\rho Y_j)}\sqrt{Y_j}\,\rho\,\sqrt{Y_j}$$

(which is an easy calculation using the compatibility of $X$ and $Y$), and

$$I(X \wedge Y) = H(X) + H(Y) - H(XY) = D(P^{XY}\|P^X \otimes P^Y) = D(P_{X\cdot Y}\|P_X \otimes P_Y)$$

(which is known from classical information theory).
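A small check of these formulas, under illustrative assumptions: two qubits in a noisy Bell state, with $X$ the computational basis measurement on the first qubit and $Y$ the same measurement on the second (so they are compatible). The conditional state is taken to be $\rho_j = \sqrt{Y_j}\rho\sqrt{Y_j}/\mathrm{Tr}(\rho Y_j)$, which is an assumption made here for the sketch; for projections $\sqrt{Y_j} = Y_j$.

```python
import numpy as np

def shannon(p):
    """Shannon entropy (base 2) of a probability vector or table."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 1e-15]
    return float(-np.sum(p * np.log2(p)))

# Illustrative state: a Bell state mixed with white noise.
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho = 0.8 * np.outer(bell, bell.conj()) + 0.2 * np.eye(4) / 4

# Compatible observables: X acts on the first qubit, Y on the second.
P0, P1, I2 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0]), np.eye(2)
X = [np.kron(P0, I2), np.kron(P1, I2)]
Y = [np.kron(I2, P0), np.kron(I2, P1)]

# Joint distribution P(i, j) = Tr(rho X_i Y_j) and its marginals.
P_XY = np.array([[np.trace(rho @ Xi @ Yj).real for Yj in Y] for Xi in X])
P_X, P_Y = P_XY.sum(axis=1), P_XY.sum(axis=0)

# Mutual information, once from entropies and once as a divergence.
I_entropies = shannon(P_X) + shannon(P_Y) - shannon(P_XY)
I_divergence = float(np.sum(P_XY * np.log2(P_XY / np.outer(P_X, P_Y))))

# Conditional entropy, once classically and once via the conditional states rho_j.
H_cond = shannon(P_XY) - shannon(P_Y)
H_cond_states = 0.0
for Yj in Y:
    pj = np.trace(rho @ Yj).real
    rho_j = Yj @ rho @ Yj / pj                 # sqrt(Y_j) = Y_j for a projection
    H_cond_states += pj * shannon([np.trace(rho_j @ Xi).real for Xi in X])

print(I_entropies, I_divergence, H_cond, H_cond_states)
```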



IV. SUBALGEBRA LANGUAGE 



Let $\mathfrak{X}$, $\mathfrak{X}_1$, $\mathfrak{X}_2$, $\mathfrak{Y}$ be compatible *-subalgebras of the C*-algebra $\mathfrak{A}$, and $\rho$ a fixed state on $\mathfrak{A}$.
First consider the inclusion map $\imath : \mathfrak{X} \hookrightarrow \mathfrak{A}$ (which is certainly completely positive) and its adjoint $\imath_* : \mathfrak{A}_* \to \mathfrak{X}_*$.
Define

$$H(\mathfrak{X}) = H_\rho(\mathfrak{X}) := H(\imath_*\rho)$$

(where at the right hand appears the von Neumann entropy). For example for $\mathfrak{X} = \mathfrak{A}$ we obtain just the von Neumann
entropy of $\rho$. For the trivial subalgebra $\mathbb{C} = \mathbb{C}1$ (which commutes obviously with every subalgebra) we obtain, as
expected, $H(\mathbb{C}) = 0$. The general philosophy behind this definition is that $H(\mathfrak{X})$ is the (von Neumann) entropy of
the global state viewed through (or restricted to) the subsystem $\mathfrak{X}$. To reflect this in the notation we define $\rho|_{\mathfrak{X}} = \imath_*\rho$.
Now conditional entropy, mutual information, and conditional mutual information are defined by reducing them to
entropy quantities:

$$H(\mathfrak{X}|\mathfrak{Y}) = H(\mathfrak{X}\mathfrak{Y}) - H(\mathfrak{Y})$$

$$I(\mathfrak{X}_1 \wedge \mathfrak{X}_2) = H(\mathfrak{X}_1) + H(\mathfrak{X}_2) - H(\mathfrak{X}_1\mathfrak{X}_2)$$

$$I(\mathfrak{X}_1 \wedge \mathfrak{X}_2|\mathfrak{Y}) = H(\mathfrak{X}_1|\mathfrak{Y}) + H(\mathfrak{X}_2|\mathfrak{Y}) - H(\mathfrak{X}_1\mathfrak{X}_2|\mathfrak{Y})$$
$$\phantom{I(\mathfrak{X}_1 \wedge \mathfrak{X}_2|\mathfrak{Y})} = H(\mathfrak{X}_1\mathfrak{Y}) + H(\mathfrak{X}_2\mathfrak{Y}) - H(\mathfrak{X}_1\mathfrak{X}_2\mathfrak{Y}) - H(\mathfrak{Y})$$

It is not at all clear a priori that these definitions are all well behaved: while it is obvious from the definition that the
entropy is always nonnegative, this is not true for the conditional entropy (as was observed by several authors before):
if $\mathfrak{A} = \mathfrak{X} \otimes \mathfrak{Y}$ and $\rho$ is a pure entangled state then $H(\mathfrak{X}|\mathfrak{Y}) = -H(\mathfrak{Y}) < 0$. This might raise pessimism whether the
other two quantities also are (at least sometimes) pathological. This they are not, as will be shown in a moment:
We have the following commutative diagram of inclusions, and the natural multiplication map $\mu$ (which is in fact a
*-algebra homomorphism, and thus completely positive!):



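[Diagram: the inclusions $\varphi_k : \mathfrak{X}_k \to \mathfrak{X}_1 \otimes \mathfrak{X}_2$ and $\jmath_k : \mathfrak{X}_k \to \mathfrak{A}$ ($k = 1, 2$), the multiplication map $\mu : \mathfrak{X}_1 \otimes \mathfrak{X}_2 \to \mathfrak{X}_1\mathfrak{X}_2$, and the inclusion $\jmath : \mathfrak{X}_1\mathfrak{X}_2 \to \mathfrak{A}$, with $\jmath_k = \jmath \circ \mu \circ \varphi_k$.]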



And hence the corresponding commutative diagram of adjoint maps (note that $\varphi_{1*}$ and $\varphi_{2*}$ are just partial traces).
With this we find

$$I(\mathfrak{X}_1 \wedge \mathfrak{X}_2) = H(\mathfrak{X}_1) + H(\mathfrak{X}_2) - H(\mathfrak{X}_1\mathfrak{X}_2)$$
$$= H(\jmath_{1*}\rho) + H(\jmath_{2*}\rho) - H(\jmath_*\rho)$$
$$= H(\varphi_{1*}\mu_*\jmath_*\rho) + H(\varphi_{2*}\mu_*\jmath_*\rho) - H(\mu_*\jmath_*\rho)$$
$$= D(\mu_*\jmath_*\rho \,\|\, \varphi_{1*}\mu_*\jmath_*\rho \otimes \varphi_{2*}\mu_*\jmath_*\rho)$$

by definition, then by commutativity of the diagram and the fact that $\mu_*$ preserves eigenvalues of density operators
(because $\mu$ is a surjective *-homomorphism, see lemma 1 below), the last by direct calculation on the tensor product
(just as for the classical formula). From the last line we see that the mutual information is nonnegative because the
divergence is, by theorem II.2 (we could also have seen this already from the definition by applying subadditivity of
von Neumann entropy to the second last line, see theorem VII.1).



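Lemma 1 Let $\mu : \mathfrak{A} \to \mathfrak{B}$ a surjective *-algebra homomorphism. Then

1. For all pure states $p \in \mathfrak{S}(\mathfrak{A})$: $\mu(p)$ pure or 0.

2. For all $A \in \mathfrak{A}$, $A \ge 0$: $\mathrm{Tr}\,A \ge \mathrm{Tr}\,\mu(A)$.

3. For pure $p \in \mathfrak{S}(\mathfrak{A})$, $q \in \mathfrak{S}(\mathfrak{B})$:

$$\mu_*(\mu(p)) = p \text{ or } \mu(p) = 0, \qquad \mu(\mu_*(\mu(p))) = \mu(p), \qquad \mu(\mu_*(q)) = q.$$

4. For $\rho \in \mathfrak{S}(\mathfrak{B})$, $\mu_*(\rho) = \sum_i \alpha_i p_i$ a diagonalization with the $\alpha_i > 0$, then $\rho = \sum_i \alpha_i \mu(p_i)$ is a diagonalization.

5. Conversely every diagonalization of a state on $\mathfrak{B}$ is by $\mu_*$ translated into a diagonalization of its $\mu_*$-image.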



Proof.

1. We have only to show that $\mu(p)$ is minimal if it is not 0: let $q'$ be any pure state with $q' \le \mu(p)$. Then

$$1 = \mathrm{Tr}(q'\mu(p)) = \mathrm{Tr}(\mu_*(q')p) \le \mathrm{Tr}(p) = 1.$$

So we must have equality which implies $p \le \mu_*(q')$, but both operators are states, so $p = \mu_*(q')$. Because $\mu_*$ is
injective this means that there is only one pure state $q' \le \mu(p)$, i.e. $\mu(p)$ is pure.

2. We may write $A = \sum_i a_i p_i$ with pure states $p_i$ and $a_i \ge 0$. Then $\mu(A) = \sum_i a_i \mu(p_i)$ and since pure states have
trace 1 the assertion follows from (1).

3. Let $A \in \mathfrak{A}$, $A \ge 0$. Then

$$\mathrm{Tr}(\mu_*(\mu(p))A) = \mathrm{Tr}(\mu(p)\mu(A)) = \mathrm{Tr}(\mu(p)\mu(A)\mu(p))$$
$$= \mathrm{Tr}(\mu(pAp)) \le \mathrm{Tr}(pAp) = \mathrm{Tr}(pA).$$

Thus $\mu_*(\mu(p)) \le p$. If $\mu(p) \ne 0$ it is a pure state, hence $\mu_*(\mu(p))$ a state which forces $\mu_*(\mu(p)) = p$. This proves
the left formula, the middle follows immediately, and for the right observe that we may choose a pure pre-image
$p$ of $q$ (in fact that will be $\mu_*(q)$, as one can see from (4)).

4. $\sum_i \alpha_i \mu(p_i)$ is certainly the diagonalization of some positive operator since the $\mu(p_i)$ which are not 0 are by the
homomorphism property and by (1) pairwise orthogonal pure states. Now observe $\mu(\mu_*(\rho)) = \sum_i \alpha_i \mu(p_i)$ and

[...]

hence equality, i.e. all $\mu(p_i)$ are pure. From

[...]

and injectivity of $\mu_*$ the assertion follows.

5. This is a direct consequence of (3) and (4). □

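For the conditional mutual information we have to do somewhat more (yet from the definition we see that its positivity
will have something to do with the strong subadditivity of von Neumann entropy, see theorem VII.1):
Consider the following commutative diagram:

[Diagram, three rows: $\mathfrak{Y} \to \mathfrak{X}_1 \otimes \mathfrak{Y} \xrightarrow{\mu_1} \mathfrak{X}_1\mathfrak{Y}$; $\mathfrak{Y} \to \mathfrak{X}_1 \otimes \mathfrak{X}_2 \otimes \mathfrak{Y} \xrightarrow{\mu} \mathfrak{X}_1\mathfrak{X}_2\mathfrak{Y} \xrightarrow{\jmath} \mathfrak{A}$; $\mathfrak{Y} \to \mathfrak{X}_2 \otimes \mathfrak{Y} \xrightarrow{\mu_2} \mathfrak{X}_2\mathfrak{Y}$; the maps into the tensor products are the inclusions $\varphi$, and the rows are connected by the corresponding inclusions.]

All maps there are completely positive, $\mu$, $\mu_1$, $\mu_2$ being *-homomorphisms. Thus the adjoints of the various $\varphi$'s are
partial traces and with $\sigma = \mu_*\jmath_*\rho$: $H(\mathfrak{X}_1\mathfrak{X}_2\mathfrak{Y}) = H(\sigma)$, $H(\mathfrak{X}_1\mathfrak{Y}) = H(\mathrm{Tr}_{\mathfrak{X}_2}\sigma)$, $H(\mathfrak{X}_2\mathfrak{Y}) = H(\mathrm{Tr}_{\mathfrak{X}_1}\sigma)$, $H(\mathfrak{Y}) =
H(\mathrm{Tr}_{\mathfrak{X}_1 \otimes \mathfrak{X}_2}\sigma)$ (where we have made use of lemma 1 several times), and we can indeed apply strong subadditivity.
Finally let us remark the nice formulas

$$H(\mathfrak{X}) = H(\mathfrak{X}|\mathbb{C}), \qquad I(\mathfrak{X}_1 \wedge \mathfrak{X}_2) = I(\mathfrak{X}_1 \wedge \mathfrak{X}_2|\mathbb{C}).$$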



Example 2 A very important special case of the definitions of this and the preceding section occurs for tensor products
of Hilbert spaces, $\mathcal{L}(\mathcal{H}_1 \otimes \mathcal{H}_2) = \mathcal{L}(\mathcal{H}_1) \otimes \mathcal{L}(\mathcal{H}_2)$, or more generally tensor products of C*-algebras: $\mathfrak{A} = \mathfrak{A}_1 \otimes \mathfrak{A}_2$.
$\mathfrak{A}_1$, $\mathfrak{A}_2$ are *-subalgebras of $\mathfrak{A}$ in the natural way, and are obviously compatible. The same then holds for observables
$A_i \subset \mathfrak{A}_i$, and similarly for more than 2 factors. In this case the restriction $\rho|_{\mathfrak{A}_i}$ is just a partial trace.
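In this tensor product setting the subalgebra quantities of this section can be computed directly with partial traces. The sketch below does so for a two-qubit Bell state (an illustrative choice; the helper `restrict` is a hypothetical name) and exhibits a negative conditional entropy together with a mutual information of twice the marginal entropy.

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy in bits, with 0 log 0 = 0."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def restrict(rho, dims, keep):
    """Restriction of rho to the tensor factors listed in `keep` (partial trace over the rest)."""
    dims = tuple(dims)
    n = len(dims)
    t = rho.reshape(dims + dims)
    # Trace out unwanted factors from the highest index down so axis numbers stay valid.
    for k in sorted(set(range(n)) - set(keep), reverse=True):
        t = np.trace(t, axis1=k, axis2=k + t.ndim // 2)
    d = int(np.prod([dims[k] for k in keep]))
    return t.reshape(d, d)

# Pure entangled state on A = L(C^2) (x) L(C^2).
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho = np.outer(bell, bell.conj())

H_X = entropy(restrict(rho, (2, 2), [0]))
H_Y = entropy(restrict(rho, (2, 2), [1]))
H_XY = entropy(rho)

H_X_given_Y = H_XY - H_Y          # conditional entropy H(X|Y)
I_XY = H_X + H_Y - H_XY           # mutual information I(X ^ Y)
print(H_X_given_Y, I_XY)
```

Both marginals of the Bell state are maximally mixed, so the conditional entropy equals $-H(\mathfrak{Y})$, as stated for pure entangled states.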
V. COMMON TONGUE



The languages of the two preceding sections may be phrased in a unified formalism (the "common tongue") using
completely positive C*-algebra maps (in particular those from or to commutative algebras, inclusion maps, and
*-algebra homomorphisms, cf. Stinespring [21]).

That this is promising one can see from the observation that observables can be interpreted in a natural way as
C*-algebra maps: $X : \Omega \to \mathfrak{A}$ corresponds by linear extension to $X : \mathfrak{B}(\Omega) \to \mathfrak{A}$, where $\mathfrak{B}(\Omega) = \mathfrak{B}(\Omega, \mathcal{F})$ is the
algebra of bounded measurable functions on $\Omega$. We follow the convention that in this algebra $j \in \Omega$ shall denote the
function that is 1 on $j$ and 0 elsewhere, so $X(j) = X_j$, and obviously $X_*(\rho)$ equals the distribution $P^X$ on $\Omega$ induced
by $X$ with $\rho$.

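Let us also introduce some notation for the observable $X$: the total observable operation $X_{tot} : \mathfrak{B}(\Omega) \otimes \mathfrak{A} \to \mathfrak{A}$
mapping $j \otimes A \mapsto \sqrt{X_j}\,A\,\sqrt{X_j}$, its interior part $X_{int} = X_{tot} \circ \imath_{\mathfrak{A}} : \mathfrak{A} \to \mathfrak{A}$ with $A \mapsto \sum_j \sqrt{X_j}\,A\,\sqrt{X_j}$, and its exterior
part $X_{ext} = X_{tot} \circ \imath_{\mathfrak{B}(\Omega)}$ which coincides with $X$.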

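Consider compatible quantum operations $\varphi : \mathfrak{X} \to \mathfrak{A}$, $\psi : \mathfrak{Y} \to \mathfrak{A}$, etc. ($\varphi$, $\psi$ are compatible if their images commute
elementwise). In this case their product is $\varphi\psi : \mathfrak{X} \otimes \mathfrak{Y} \to \mathfrak{A}$ mapping $X \otimes Y \mapsto \varphi(X)\psi(Y)$:

[Diagram: the inclusions $\mathfrak{X} \to \mathfrak{X} \otimes \mathfrak{Y} \leftarrow \mathfrak{Y}$, the operations $\varphi : \mathfrak{X} \to \mathfrak{A}$ and $\psi : \mathfrak{Y} \to \mathfrak{A}$, and their product $\varphi\psi : \mathfrak{X} \otimes \mathfrak{Y} \to \mathfrak{A}$.]

Note that this generalizes the product of observables, as well as the product map $\mu$ of subalgebras.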

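Now simply define $H(\varphi) = H(\varphi_*\rho)$, and again the conditional entropy and the informations are defined by reduction
to entropy, e.g. $H(\varphi|\psi) = H(\varphi\psi) - H(\psi)$, or $I(\varphi \wedge \psi) = H(\varphi) + H(\psi) - H(\varphi\psi)$.

For the mutual information observe that (see previous diagram):

$$I(\varphi \wedge \psi) = D((\varphi\psi)_*\rho \,\|\, \varphi_*\rho \otimes \psi_*\rho) = D(\sigma \,\|\, \mathrm{Tr}_{\mathfrak{Y}}\sigma \otimes \mathrm{Tr}_{\mathfrak{X}}\sigma), \qquad \text{with } \sigma = (\varphi\psi)_*\rho.$$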



Note the difference to Ohya/Petz [17]: with them the entropy of an operation is related to the mutual information of
the operation as a channel (see section VIII). With us the entropy of an operation is the entropy of a state "viewed
through" this operation (as was the idea with the entropy of a subsystem, and obviously also with the entropy of an
observable).



VI. PIDGIN 

With the insight of the preceding section we may now form hybrid expressions involving observables and subalgebras
at the same time: let $\imath : \mathfrak{X} \hookrightarrow \mathfrak{A}$, $\jmath : \mathfrak{Y} \hookrightarrow \mathfrak{A}$ be *-subalgebra inclusions, and $X$, $Y$ observables on $\mathfrak{A}$, all four compatible.
Then we have

$$H(\mathfrak{X}|Y) = H(\imath Y) - H(Y)$$

$$I(\mathfrak{X} \wedge Y) = H(\imath) + H(Y) - H(\imath Y)$$

and lots of others. From the previous section we know that the information quantities are nonnegative, but also the
entropy conditional on an observable, from the formula

$$H(\mathfrak{X}|Y) = \sum_j \mathrm{Tr}(\rho Y_j)\, H_{\rho_j}(\mathfrak{X}), \qquad \text{with } \rho_j = \frac{1}{\mathrm{Tr}(\rho Y_j)}\sqrt{Y_j}\,\rho\,\sqrt{Y_j}.$$

But also again there are some expressions which seem suspicious, like

$$H(X|\mathfrak{Y}) = H(X\jmath) - H(\mathfrak{Y}).$$
But due to the inequality of theorem VII.10 in fact it behaves nicely.

VII. INEQUALITIES



A. Entropy 

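Theorem 1 For compatible *-subalgebras $\mathfrak{A}_1$, $\mathfrak{A}_2$, $\mathfrak{A}_3$ one has:

1. Subadditivity: $H(\mathfrak{A}_1\mathfrak{A}_2) \le H(\mathfrak{A}_1) + H(\mathfrak{A}_2)$.

2. Strong subadditivity: $H(\mathfrak{A}_1\mathfrak{A}_2\mathfrak{A}_3) + H(\mathfrak{A}_2) \le H(\mathfrak{A}_1\mathfrak{A}_2) + H(\mathfrak{A}_2\mathfrak{A}_3)$.

Proof. Subadditivity is a special case of strong subadditivity: $\mathfrak{A}_2 = \mathbb{C}$. The latter can be reduced to the familiar
form (see e.g. Wehrl [?]) by the same type of argument as we used in section IV for the nonnegativity of conditional
mutual information. □

Theorem 2 Let $\mathfrak{X}$, $\mathfrak{Y}$ compatible, $\rho|_{\mathfrak{X}\mathfrak{Y}}$ pure. Then $H(\mathfrak{X}) = H(\mathfrak{Y})$.

Proof. By retracting the state $\rho$ to $\mathfrak{X} \otimes \mathfrak{Y}$ by the multiplication map $\mu : \mathfrak{X} \otimes \mathfrak{Y} \to \mathfrak{X}\mathfrak{Y}$ (see lemma IV.1) we may
assume that we have a pure state $\rho$ on $\mathfrak{X} \otimes \mathfrak{Y}$. Then the assertion of the theorem is $H(\mathrm{Tr}_{\mathfrak{X}}\rho) = H(\mathrm{Tr}_{\mathfrak{Y}}\rho)$ which is
well known (proof via the polar decomposition of $\rho$). □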
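Both statements are easy to probe numerically in the tensor product case $\mathfrak{X} = \mathcal{L}(\mathbb{C}^2)$, $\mathfrak{Y} = \mathcal{L}(\mathbb{C}^3)$. The following sketch (random test states and helper names are assumptions made for illustration) checks subadditivity for a random mixed state and the equality of the marginal entropies for a random pure state.

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy in bits, with 0 log 0 = 0."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def marginals(rho, dx, dy):
    """Restrictions of a state on C^dx (x) C^dy to the first and second factor."""
    t = rho.reshape(dx, dy, dx, dy)
    return np.einsum('iaja->ij', t), np.einsum('aiaj->ij', t)

rng = np.random.default_rng(3)
dx, dy = 2, 3

# Random mixed state: subadditivity H(XY) <= H(X) + H(Y).
G = rng.normal(size=(dx * dy, dx * dy)) + 1j * rng.normal(size=(dx * dy, dx * dy))
rho = G @ G.conj().T
rho = rho / np.trace(rho).real
rho_x, rho_y = marginals(rho, dx, dy)
H_X, H_Y, H_XY = entropy(rho_x), entropy(rho_y), entropy(rho)

# Random pure state: the two marginal entropies coincide.
psi = rng.normal(size=dx * dy) + 1j * rng.normal(size=dx * dy)
psi = psi / np.linalg.norm(psi)
pure_x, pure_y = marginals(np.outer(psi, psi.conj()), dx, dy)
H_pure_X, H_pure_Y = entropy(pure_x), entropy(pure_y)

print("subadditivity:", H_XY, "<=", H_X + H_Y)
print("pure state marginals:", H_pure_X, H_pure_Y)
```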

Another kind of inequality may serve as an operational justification of the definition of von Neumann entropy. Call
a quantum operation $\varphi : \mathfrak{A}_1 \to \mathfrak{A}_2$ doubly stochastic if it preserves the trace, i.e. for all $A \in \mathfrak{A}_1$: $\mathrm{Tr}\,\varphi(A) = \mathrm{Tr}\,A$
(see Ohya/Petz [17]). We will consider the less restrictive condition $\mathrm{Tr}\,\varphi(A) \le \mathrm{Tr}\,A$, and for an observable $X$ and
subalgebra $\mathfrak{X}$ let us say they are maximal in $\mathfrak{A}$ if $X$ and the inclusion map have this property (obviously for the
subalgebra this implies doubly stochastic). Main examples are: an observable whose atoms are minimal in the target
algebra, i.e. have only trivial decompositions into positive operators, and a maximal commutative subalgebra.

Theorem 3 (Entropy increase) Let $\varphi : \mathfrak{Y} \to \mathfrak{X}$ with $\mathrm{Tr}\,\varphi(A) = \mathrm{Tr}\,A$, and $\psi : \mathfrak{X} \to \mathfrak{A}$ quantum operations. Then
$H(\psi \circ \varphi) \ge H(\psi)$. (Notice that in the physical sense the operation $\varphi_*$ is applied after $\psi_*$.)

Before we prove this let us note two important cases of equality: Let $\rho = \sum_i \lambda_i p_i$ with mutually orthogonal pure states
$p_i$, $\lambda_i \ge 0$, $\sum_i p_i = 1$. Then equality holds for the subalgebra generated by the $p_i$ (in fact for any subalgebra which
contains them), and for the observable that corresponds to the $p_i$'s resolution of 1.

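Proof of theorem 3. Let $\sigma = \psi_*\rho$; we have to prove $H(\varphi_*\sigma) \ge H(\sigma)$. From the previous discussion we see that we
may assume $\mathfrak{Y}$ to be commutative, without changing the trace relation. Let $\sigma = \sum_i \alpha_i p_i$ be a diagonalization with pure
states $p_i$ on $\mathfrak{X}$, and $q_j$ the family of minimal idempotents of $\mathfrak{Y}$ (which by commutativity are orthogonal). Then we
have decompositions $\varphi_*p_i = \sum_j \beta_{ij} q_j$, hence

$$\varphi_*\sigma = \sum_i \alpha_i \varphi_*p_i = \sum_j \Big(\sum_i \alpha_i \beta_{ij}\Big) q_j.$$

Now observe that for all $j$

$$\sum_i \beta_{ij} = \mathrm{Tr}\Big(q_j \sum_i \varphi_*p_i\Big) = \mathrm{Tr}\Big((\varphi q_j) \sum_i p_i\Big) = \mathrm{Tr}(\varphi q_j) \le \mathrm{Tr}(q_j) = 1$$

and the result follows from the formulas $H(\sigma) = H(\alpha_i|i)$, $H(\varphi_*\sigma) = H(\sum_i \beta_{ij}\alpha_i|j)$. □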
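For a concrete instance of the theorem, take $\psi$ to be the identity on $\mathfrak{X} = \mathcal{L}(\mathbb{C}^3)$ and $\varphi$ a von Neumann observable with rank-one atoms, which preserves the trace. The sketch below (random state, random basis and helper names are illustrative assumptions) compares the entropy of the outcome distribution with the von Neumann entropy, and also checks the equality case of measuring in an eigenbasis of the state.

```python
import numpy as np

def shannon(p):
    """Shannon entropy in bits of a probability vector."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 1e-15]
    return float(-np.sum(p * np.log2(p)))

def entropy(rho):
    """von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def measure(rho, U):
    """Outcome distribution <u_j|rho|u_j> for the atoms |u_j><u_j| (u_j = columns of U)."""
    return np.real(np.einsum('ij,ik,kj->j', U.conj(), rho, U))

rng = np.random.default_rng(2)
d = 3
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = G @ G.conj().T
rho = rho / np.trace(rho).real

# A generic orthonormal basis (from QR) and an eigenbasis of rho.
U_generic, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
_, U_eigen = np.linalg.eigh(rho)

H_state = entropy(rho)
H_generic = shannon(measure(rho, U_generic))
H_eigen = shannon(measure(rho, U_eigen))
print(H_state, H_generic, H_eigen)
```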

Let us formulate the special cases of maximal observables and maximal subalgebras as a corollary: 

Corollary 4 Let $X$ be an observable maximal in $\mathfrak{X}$, then $H(X) \ge H(\mathfrak{X})$. Let $\mathfrak{X}'$ be a subalgebra maximal in $\mathfrak{X}$, then
$H(\mathfrak{X}') \ge H(\mathfrak{X})$. □

An application of this is in the proof of 

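Theorem 5 Let $\mathfrak{X}$, $\mathfrak{Y}$ compatible, $\rho$ any state. Then $|H(\mathfrak{X}) - H(\mathfrak{Y})| \le H(\mathfrak{X}\mathfrak{Y})$.

Proof. Like in the previous theorem we may assume that $\rho$ is a state on $\mathfrak{X} \otimes \mathfrak{Y}$, and by symmetry we have to prove
that

$$H(\mathfrak{X}) - H(\mathfrak{Y}) \le H(\mathfrak{X}\mathfrak{Y}).$$

If we think of $\mathfrak{X}$ and $\mathfrak{Y}$ as sums of full operator algebras, say $\mathfrak{X} = \bigoplus_i \mathcal{L}(\mathcal{H}_i)$, $\mathfrak{Y} = \bigoplus_j \mathcal{L}(\mathcal{K}_j)$, then embedding them
into $\mathcal{L}(\bigoplus_i \mathcal{H}_i)$, $\mathcal{L}(\bigoplus_j \mathcal{K}_j)$, respectively, does not change the entropies involved (because the subalgebras are maximal).
Thus we may assume that $\mathfrak{X} = \mathcal{L}(\mathcal{H})$, $\mathfrak{Y} = \mathcal{L}(\mathcal{K})$. Now consider a purification $|\psi\rangle$ of $\rho$ on the Hilbert space $\mathcal{H} \otimes \mathcal{K} \otimes \mathcal{L}$
(see e.g. [27]): this means $\rho = \mathrm{Tr}_{\mathcal{L}(\mathcal{L})}|\psi\rangle\langle\psi|$. Now, writing $\mathfrak{Z} = \mathcal{L}(\mathcal{L})$, by theorem 2 $H(\mathfrak{X}) = H(\mathfrak{Y}\mathfrak{Z})$, $H(\mathfrak{X}\mathfrak{Y}) = H(\mathfrak{Z})$, and the assertion
follows from subadditivity (theorem 1): $H(\mathfrak{Y}\mathfrak{Z}) \le H(\mathfrak{Y}) + H(\mathfrak{Z})$. □

B. Information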



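The following inequality for mutual information is a straightforward generalization of the Holevo bound, see also
next section VIII:

Theorem 6 Let $X$, $Y$ be compatible observables with values in compatible *-subalgebras $\mathfrak{X}$, $\mathfrak{Y}$, respectively. Then

$$I(X \wedge Y) \le I(X \wedge \mathfrak{Y}) \le I(\mathfrak{X} \wedge \mathfrak{Y})$$

(Conditions of equality!).
Proof. Consider the diagram

[Diagram: the inclusions $\mathfrak{B}(\Omega_X) \to \mathfrak{B}(\Omega_X) \otimes \mathfrak{B}(\Omega_Y) \leftarrow \mathfrak{B}(\Omega_Y)$, the map $X \otimes Y : \mathfrak{B}(\Omega_X) \otimes \mathfrak{B}(\Omega_Y) \to \mathfrak{X} \otimes \mathfrak{Y}$, and the multiplication map $\mu : \mathfrak{X} \otimes \mathfrak{Y} \to \mathfrak{A}$.]

and apply the Lindblad-Uhlmann monotonicity theorem twice, with $\mu_*(\rho)$ and the maps $(\mathrm{id} \otimes Y)_*$ and $(X \otimes \mathrm{id})_*$,
one after the other. □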
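A minimal numerical instance of this chain, under illustrative assumptions: $\mathfrak{X} = \mathbb{C}\mathcal{X}$ is commutative with two letters, $\mathfrak{Y} = \mathcal{L}(\mathbb{C}^2)$, the state is $\sigma = \sum_x Q(x)[x] \otimes \tau_x$ with signal states $\tau_0 = |0\rangle\langle 0|$ and $\tau_1 = |+\rangle\langle +|$, $X$ is the maximal observable on $\mathfrak{X}$, and $Y$ is the computational basis measurement on $\mathfrak{Y}$. In this case $I(\mathfrak{X} \wedge \mathfrak{Y})$ reduces to the familiar expression $H(\sum_x Q(x)\tau_x) - \sum_x Q(x)H(\tau_x)$.

```python
import numpy as np

def shannon(p):
    """Shannon entropy in bits of a probability vector or table."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 1e-15]
    return float(-np.sum(p * np.log2(p)))

def entropy(rho):
    """von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

# Input distribution and signal states (illustrative choices).
Q = np.array([0.5, 0.5])
ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2)
tau = [np.outer(ket0, ket0), np.outer(ketp, ketp)]

# sigma = sum_x Q(x) [x] (x) tau_x, written as a block diagonal 4x4 matrix.
E = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
sigma = sum(Q[x] * np.kron(E[x], tau[x]) for x in range(2))
sigma_X = np.diag(Q)                                   # restriction to the classical part
sigma_Y = sum(Q[x] * tau[x] for x in range(2))         # restriction to the quantum part

# Subalgebra mutual information and the Holevo-type expression.
I_subalgebras = entropy(sigma_X) + entropy(sigma_Y) - entropy(sigma)
chi = entropy(sigma_Y) - sum(Q[x] * entropy(tau[x]) for x in range(2))

# Observable mutual information for Y = measurement in the computational basis.
Y = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
P_XY = np.array([[Q[x] * np.trace(tau[x] @ Yj).real for Yj in Y] for x in range(2)])
I_observables = shannon(P_XY.sum(axis=1)) + shannon(P_XY.sum(axis=0)) - shannon(P_XY)

print(I_observables, "<=", I_subalgebras, "(chi =", chi, ")")
```

Other choices of the observable $Y$ give other values of the left hand side, but by the theorem none can exceed the subalgebra quantity.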

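This can be greatly extended: for example if $\mathfrak{X} \subset \mathfrak{X}'$, $\mathfrak{Y} \subset \mathfrak{Y}'$, then

$$I(\mathfrak{X} \wedge \mathfrak{Y}) \le I(\mathfrak{X}' \wedge \mathfrak{Y}').$$

The most general form is

$$I(\psi_1 \circ \varphi_1 \wedge \psi_2 \circ \varphi_2) \le I(\psi_1 \wedge \psi_2)$$

in the diagram

[Diagram: quantum operations $\varphi_k : \mathfrak{A}'_k \to \mathfrak{A}_k$ and compatible $\psi_k : \mathfrak{A}_k \to \mathfrak{A}$ ($k = 1, 2$), together with the induced maps $\mathfrak{A}'_1 \otimes \mathfrak{A}'_2 \to \mathfrak{A}_1 \otimes \mathfrak{A}_2 \to \mathfrak{A}$.]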

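Theorem 7 Let $\mathfrak{X}_1$, $\mathfrak{X}_2$, $\mathfrak{Y}_1$, $\mathfrak{Y}_2$ be compatible *-subalgebras of $\mathfrak{A}$, $\rho$ a state on $\mathfrak{A}$. Then

$$I(\mathfrak{X}_1\mathfrak{X}_2 \wedge \mathfrak{Y}_1\mathfrak{Y}_2) \le I(\mathfrak{X}_1 \wedge \mathfrak{Y}_1) + I(\mathfrak{X}_2 \wedge \mathfrak{Y}_2)$$

if $I(\mathfrak{Y}_1 \wedge \mathfrak{X}_2\mathfrak{Y}_2|\mathfrak{X}_1) = 0$ and $I(\mathfrak{Y}_2 \wedge \mathfrak{X}_1\mathfrak{Y}_1|\mathfrak{X}_2) = 0$ (i.e. $\mathfrak{Y}_k$ is independent from the other subalgebras conditional on
$\mathfrak{X}_k$).

Proof. First observe that the conditional independence mentioned, $I(\mathfrak{Y}_1 \wedge \mathfrak{X}_2\mathfrak{Y}_2|\mathfrak{X}_1) = 0$, is equivalent to
$H(\mathfrak{Y}_1|\mathfrak{X}_1\mathfrak{X}_2\mathfrak{Y}_2) = H(\mathfrak{Y}_1|\mathfrak{X}_1)$. By theorem 1 we then have also $H(\mathfrak{Y}_1|\mathfrak{X}_1\mathfrak{X}_2) = H(\mathfrak{Y}_1|\mathfrak{X}_1)$, and likewise for $\mathfrak{Y}_2$. Now observe (with
the obvious chain rule)

$$H(\mathfrak{Y}_1\mathfrak{Y}_2|\mathfrak{X}_1\mathfrak{X}_2) = H(\mathfrak{Y}_1|\mathfrak{X}_1\mathfrak{X}_2\mathfrak{Y}_2) + H(\mathfrak{Y}_2|\mathfrak{X}_1\mathfrak{X}_2)$$
$$= H(\mathfrak{Y}_1|\mathfrak{X}_1) + H(\mathfrak{Y}_2|\mathfrak{X}_2)$$

and hence

$$I(\mathfrak{X}_1\mathfrak{X}_2 \wedge \mathfrak{Y}_1\mathfrak{Y}_2) = H(\mathfrak{Y}_1\mathfrak{Y}_2) - H(\mathfrak{Y}_1\mathfrak{Y}_2|\mathfrak{X}_1\mathfrak{X}_2)$$
$$\le H(\mathfrak{Y}_1) + H(\mathfrak{Y}_2) - H(\mathfrak{Y}_1|\mathfrak{X}_1) - H(\mathfrak{Y}_2|\mathfrak{X}_2)$$
$$= I(\mathfrak{X}_1 \wedge \mathfrak{Y}_1) + I(\mathfrak{X}_2 \wedge \mathfrak{Y}_2)$$

where we have used the subadditivity of von Neumann entropy (theorem 1). □
The same obviously applies if we have $n$ *-subalgebras $\mathfrak{X}_k$, and $n$ $\mathfrak{Y}_k$, all compatible, and if $\mathfrak{Y}_k$ is independent from
the others given $\mathfrak{X}_k$, i.e. for all $k$

$$H(\mathfrak{Y}_k|\mathfrak{X}_1 \cdots \mathfrak{X}_n \mathfrak{Y}_1 \cdots \widehat{\mathfrak{Y}_k} \cdots \mathfrak{Y}_n) = H(\mathfrak{Y}_k|\mathfrak{X}_k).$$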



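Corollary 8 Let $\mathfrak{X}_1, \ldots, \mathfrak{X}_n$, $\mathfrak{Y}_1, \ldots, \mathfrak{Y}_n$ be C*-algebras, $\mathfrak{X}_i = \mathbb{C}\mathcal{X}_i$, $\mathfrak{A} = \mathfrak{X}_1 \otimes \cdots \otimes \mathfrak{X}_n \otimes \mathfrak{Y}_1 \otimes \cdots \otimes \mathfrak{Y}_n$, and $P$ a probability
distribution on $\mathcal{X}_1 \times \cdots \times \mathcal{X}_n$. Then with the state

$$\gamma = \sum_{x_1,\ldots,x_n} P(x_1, \ldots, x_n)\, x_1 \otimes \cdots \otimes x_n \otimes W_{x_1} \otimes \cdots \otimes W_{x_n}$$

on $\mathfrak{A}$ (where $W$ maps the elements of $\mathcal{X}_i$ to states on $\mathfrak{Y}_i$)

$$I(\mathfrak{X}_1 \cdots \mathfrak{X}_n \wedge \mathfrak{Y}_1 \cdots \mathfrak{Y}_n) \le \sum_k I(\mathfrak{X}_k \wedge \mathfrak{Y}_k).$$

Proof. We only have to check the conditional independence, which is left to the reader. □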

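We note another simple estimate for the mutual information:
Theorem 9 For compatible *-subalgebras $\mathfrak{X}$, $\mathfrak{Y}$: $I(\mathfrak{X} \wedge \mathfrak{Y}) \le 2\min\{H(\mathfrak{X}), H(\mathfrak{Y})\}$.

Proof. Put together the formula $I(\mathfrak{X} \wedge \mathfrak{Y}) = H(\mathfrak{X}) - H(\mathfrak{X}|\mathfrak{Y})$ and the simple estimate $H(\mathfrak{X}|\mathfrak{Y}) \ge -H(\mathfrak{X})$ from
theorem 5. □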

C. Conditional entropy 

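Theorem 10 Let $\varphi : \mathfrak{X} \to \mathfrak{A}$, $\psi : \mathfrak{Y} \to \mathfrak{A}$ be compatible quantum operations with $\mathfrak{X}$ or $\mathfrak{Y}$ commutative. Then $H(\varphi|\psi) \ge 0$.
Proof. Let $\sigma = (\varphi\psi)_*\rho$. Since the hypothesis is symmetric in $\varphi$ and $\psi$ it suffices to consider $H(\psi|\varphi)$, and by definition and lemma IV.1

$$H(\psi|\varphi) = H(\sigma) - H(\mathrm{Tr}_{\mathfrak{Y}}\sigma).$$

First case: $\mathfrak{X}$ is commutative, so we can write $\sigma = \sum_x Q(x)[x] \otimes \tau_x$ with a distribution $Q$ on $\mathcal{X}$ and states $\tau_x$ on $\mathfrak{Y}$.
Obviously $H(\sigma) = H(Q) + \sum_x Q(x)H(\tau_x)$, and $\mathrm{Tr}_{\mathfrak{Y}}\sigma = \sum_x Q(x)[x] = Q$, and hence $H(\psi|\varphi) = \sum_x Q(x)H(\tau_x) \ge 0$.
Second case: $\mathfrak{Y}$ is commutative, so we can write $\sigma = \sum_x Q(x)\tau_x \otimes [x]$, like in the first case. $H(\sigma)$ is calculated as
before, but now $\mathrm{Tr}_{\mathfrak{Y}}\sigma = \sum_x Q(x)\tau_x = Q\tau$, and

$$H(\psi|\varphi) = H(Q) - \Big(H(Q\tau) - \sum_x Q(x)H(\tau_x)\Big) = H(Q) - I(Q;\tau) \ge 0$$

(see section VIII, for the last line theorem VIII.1). □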



Note 11 From the proof we see that the commutativity of $\mathfrak{X}$ or $\mathfrak{Y}$ enters in the representation of $\sigma$ as a particular
separable state with respect to the subalgebras $\mathfrak{X}$, $\mathfrak{Y}$ (see definition below), namely with one party admitting common
diagonalization of her states. We formulate as a conjecture the more general:
$H(\mathfrak{X}|\mathfrak{Y}) \ge 0$ if $\rho$ is separable with respect to $\mathfrak{X}$ and $\mathfrak{Y}$.

From this it would follow that in this case $I(\mathfrak{X} \wedge \mathfrak{Y}) \le \min\{H(\mathfrak{X}), H(\mathfrak{Y})\}$ (see theorem 9), which we now only get from
the commutativity assumption.

Definition 12 Call $\rho$ separable with respect to the compatible *-subalgebras $\mathfrak{X}_1, \ldots, \mathfrak{X}_m$ of $\mathfrak{A}$, if, for the natural
multiplication map $\mu : \mathfrak{X}_1 \otimes \cdots \otimes \mathfrak{X}_m \to \mathfrak{A}$, $\mu_*\rho$ is a separable state on $\mathfrak{X}_1 \otimes \cdots \otimes \mathfrak{X}_m$, i.e. a convex combination of
product states $\sigma_1 \otimes \cdots \otimes \sigma_m$, $\sigma_i \in \mathfrak{S}(\mathfrak{X}_i)$. If $\mu_*\rho$ is a product state, we call also $\rho$ a product state with respect to
$\mathfrak{X}_1, \ldots, \mathfrak{X}_m$.

Theorem 13 (Knowledge decreases uncertainty) Let $\varphi : \mathfrak{X} \to \mathfrak{A}$, $\psi : \mathfrak{Y} \to \mathfrak{A}$ be compatible quantum operations,
and $\varphi' : \mathfrak{X}' \to \mathfrak{X}$ any quantum operation. Then $H(\psi|\varphi) \le H(\psi|\varphi \circ \varphi')$; in particular $H(\psi|\varphi) \le H(\psi)$.

Proof. The inequality is obviously equivalent to $I(\psi \wedge \varphi) \ge I(\psi \wedge \varphi \circ \varphi')$, i.e. the information inequality. □
Defining $h(x) = -x \log x - (1 - x) \log(1 - x)$ for $x \in [0, 1]$ we have the famous

Theorem 14 (Fano inequality) Let $\rho$ be a state on $\mathfrak{A}$, and $\mathfrak{Y}$ be a *-subalgebra of $\mathfrak{A}$, compatible with the observable $X$
(indexed by $\mathcal{X}$). Then for any observable $Y$ with values in $\mathfrak{Y}$ the probability that "$X \ne Y$", i.e. $P_e = 1 - \sum_j \mathrm{Tr}(\rho X_j Y_j)$,
satisfies

$$H(X|\mathfrak{Y}) \le h(P_e) + P_e \log(|\mathcal{X}| - 1)$$

Proof. By the previous theorem 13 it suffices to prove the inequality with $H(X|Y)$ instead of $H(X|\mathfrak{Y})$. But then we
have the classical Fano inequality: the uncertainty on $X$ given $Y$ may be estimated by the uncertainty of the event
that they are equal plus the uncertainty on the value of $X$ if they are not. □
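The classical inequality that the proof reduces to can be checked numerically. The following is a minimal sketch, assuming base-2 logarithms and an arbitrarily chosen joint distribution of two ternary random variables; the function names and the numbers in `joint` are illustrative only.

```python
import math

def h(x):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def cond_entropy(joint):
    """H(X|Y) in bits for joint[x][y] = Pr{X = x, Y = y}."""
    total = 0.0
    for y in range(len(joint[0])):
        p_y = sum(row[y] for row in joint)
        for row in joint:
            if row[y] > 0:
                total -= row[y] * math.log2(row[y] / p_y)
    return total

def fano_bound(joint):
    """h(P_e) + P_e log(|X| - 1) with P_e = Pr{X != Y}."""
    p_e = 1.0 - sum(joint[i][i] for i in range(len(joint)))
    return h(p_e) + p_e * math.log2(len(joint) - 1)

# Illustrative joint distribution (rows: values of X, columns: values of Y).
joint = [[0.30, 0.02, 0.01],
         [0.03, 0.28, 0.02],
         [0.02, 0.04, 0.28]]

lhs = cond_entropy(joint)
rhs = fano_bound(joint)
print(lhs, "<=", rhs)
```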

Corollary 15 Let $\mathfrak{X}$ be a commutative *-subalgebra compatible with $\mathfrak{Y}$, and $X$ the (uniquely determined) maximal
observable on $\mathfrak{X}$, $P_e$ as in the theorem; then

$$H(\mathfrak{X}|\mathfrak{Y}) \le h(P_e) + P_e \log(\mathrm{Tr}\,\mathrm{supp}(\rho|_{\mathfrak{X}}) - 1)$$

Proof. First observe that $H(\mathfrak{X}|\mathfrak{Y}) = H(X|\mathfrak{Y})$. To apply the theorem we only have to restrict the range of $X$ to those
values that are actually assumed. □

VIII. QUANTUM CHANNELS 
A. General remarks 

We consider in the following only quantum channels with a priori fixed input states (i.e. classical-quantum channels
after Holevo). Formally such a system may be described by the collection $(W_x | x \in \mathcal{X})$ of states with $W_x$ appearing
at the receiving end when $x$ is sent. This may also be described by its linear extension $W : \mathbb{C}\mathcal{X}_* \to \mathfrak{Y}_*$, a trace
preserving quantum operation (this is the only occasion where we omit the subscript $*$ for a quantum map between
state spaces).

Side remark: the most general quantum channel appears if we allow at the left any C*-algebra instead of the 
commutative one. In this case we are free to choose input states — in general from a continuum. Even more, we may 
(in block coding) use entangled states. For simplicity, and because of some unsolved problems in the more general 
case we decided here to stay with classical-quantum channels. 

This idea of a channel as a process, after choosing a distribution $P$ on $\mathcal{X}$ (i.e. a state on $\mathfrak{X} = \mathbb{C}\mathcal{X}$), which is an
average input, leads to the notions of the average output $PW = W(P) = \sum_{x \in \mathcal{X}} P(x)W_x$ and the mutual information
$I(P, W) = H(PW) - \sum_{x} P(x)H(W_x)$.

Whereas this is a physically perfectly reasonable model with its appropriate ideas, looking at classical information 
theory we see that there is also another way of thinking about channels: namely as stochastic two-end systems, one 
end of which is declared the sender, the other the receiver (even though formally the thing is symmetric), and their 
respective input and output distributions are marginals of some joint distribution (which reflects the dependence of 
the output on the input). To model this with quantum systems define the channel state $\gamma = \sum_x P(x)\, x \otimes W_x$ on
$\mathfrak{X} \otimes \mathfrak{Y}$. Notice that we (abstractly, and somewhat unnaturally) divided the system into two: its past and its future,
and $\gamma$ describes the correlation between them. Obviously $P$ and $PW$ are obtained as marginals, by tracing over $\mathfrak{Y}$,
$\mathfrak{X}$, respectively. In fact it is an easy exercise to verify that $I(P, W) = I(\mathfrak{X} \wedge \mathfrak{Y})$.
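The exercise can be sketched as follows, under the assumptions that $I(\mathfrak{X} \wedge \mathfrak{Y}) = H(\mathfrak{X}) + H(\mathfrak{Y}) - H(\mathfrak{X}\mathfrak{Y})$ is evaluated on the channel state and that $I(P, W) = H(PW) - \sum_x P(x)H(W_x)$. Since the $x$ are mutually orthogonal, the channel state is block diagonal, which gives the first line.

```latex
\begin{align*}
H(\gamma) &= H(P) + \sum_x P(x) H(W_x) \\
I(\mathfrak{X} \wedge \mathfrak{Y})
  &= H(\mathrm{Tr}_{\mathfrak{Y}}\gamma) + H(\mathrm{Tr}_{\mathfrak{X}}\gamma) - H(\gamma) \\
  &= H(P) + H(PW) - H(P) - \sum_x P(x) H(W_x) \\
  &= H(PW) - \sum_x P(x) H(W_x) = I(P, W)
\end{align*}
```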

This second point of view (and its connection to the first, which was noticed before by Hall [28] in his investigation
of what he calls context mappings) was the motive for the whole presentation in the preceding sections: to phrase 
the information and entropy concepts initially defined in the context of processing states via quantum operations in a 
"static" model that allows for the use of observables (i.e. random variables), and comparison with certain subalgebras. 
B. Multiway channels



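In the sequel we will also consider a more general channel: we call it the (all-to-all) quantum multiway channel
with $s$ senders and $r$ receivers (or the compound multiple access channel), and it consists of $s$ commutative
C*-algebras $\mathfrak{X}_1, \ldots, \mathfrak{X}_s$ (say $\mathfrak{X}_i = \mathbb{C}\mathcal{X}_i$, and let $\mathfrak{X} = \mathfrak{X}_1 \otimes \cdots \otimes \mathfrak{X}_s$), a quantum operation $W : \mathfrak{X} \to \mathfrak{Y}$, and $r$ compatible
*-subalgebras $\mathfrak{Y}_1, \ldots, \mathfrak{Y}_r$ of $\mathfrak{Y}$. The idea here is that the $\mathfrak{X}_i$ are the senders, the $\mathfrak{Y}_j$ the receivers, and each sender
wants to send the same message to every receiver, with small error probability. This we formalize in the notion of an
$(n, \epsilon)$-code which consists of $s$ mappings $f_i : \mathcal{M}_i \to \mathcal{X}_i^n$ with finite sets $\mathcal{M}_i$, and $r$ decoding observables $Y_j$, indexed
by $\mathcal{M}'_1 \times \cdots \times \mathcal{M}'_s \supset \mathcal{M}_1 \times \cdots \times \mathcal{M}_s$, with values in $\mathfrak{Y}_j^{\otimes n}$ (and so these are automatically compatible) with the $r$
(average) error probabilities

$$\bar{e}_j(f_1, \ldots, f_s, Y_j) = 1 - \frac{1}{|\mathcal{M}_1| \cdots |\mathcal{M}_s|} \sum_{m_i \in \mathcal{M}_i} \mathrm{Tr}\left(W^{\otimes n}(f_1(m_1), \ldots, f_s(m_s))\, Y_{j, m_1 \ldots m_s}\right)$$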

all being at most $\epsilon$. The rate of the code is the tuple $(R_1, \ldots, R_s)$ with $R_i = \frac{1}{n} \log |\mathcal{M}_i|$. The problem is then to
determine the capacity regions $R(\epsilon)$, i.e. the set of all achievable $s$-tuples with error probability $\epsilon$ (where achievable
means that for infinitely many $n$ there exist $(n, \epsilon)$-codes with rate tuples converging to the given tuple), or more
realistically $R = \bigcap_{\epsilon > 0} R(\epsilon)$ (which is usually called the capacity region). Obviously this consists of two parts: first
to exhibit the existence of codes with certain rates, second to prove bounds on the rates of any code.
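To fix ideas about the rate tuple, here is a tiny sketch, assuming base-2 logarithms; the function name and the message set sizes are purely illustrative.

```python
import math

def rate_tuple(message_set_sizes, n):
    """Rates R_i = (1/n) log2 |M_i| of a block code of length n."""
    return tuple(math.log2(size) / n for size in message_set_sizes)

# Two senders with 16 and 4 messages, block length 8.
rates = rate_tuple([16, 4], n=8)
print(rates)
```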

A little history: with classical communication the multiway channel was first considered by Shannon [29], and the
exact determination of the capacity region was done by Ahlswede [30], [31]. There are of course even more general
multi-user communication models, most of which are unsolved: a good overview is in the paper [32] by El Gamal
and Cover. Quantum channels for a single sender and receiver have been around since the sixties, but the first formal
definitions seem to have been given by Holevo. The quantum multiway channel as defined here is a slightly
smoothed presentation of the definition by Allahverdyan and Saakian [33] (where the channel is a general quantum
map).

Before we can tackle this problem (of which we will solve in this paper only the second part, giving bounds) we have 
to collect a few facts. 

The following is a corollary to the information inequality: 

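Theorem 1 (Holevo bound) For any classical-quantum channel $W : \mathcal{X} \to \mathfrak{S}(\mathfrak{Y})$, any distribution $P$ on $\mathcal{X}$, and
any observable $Y$ on $\mathfrak{Y}$

$$I(P, W) \ge I(P, Y_* \circ W)$$

More generally, for any completely positive quantum operation $\varphi : \mathfrak{Z} \to \mathfrak{Y}$ one has $I(P, W) \ge I(P, \varphi_* \circ W)$. In
particular $I(P, W) \le I(P, \mathrm{id}) = H(P)$.

Proof. All ingredients are already known: we define a channel state $\gamma = \sum_x P(x)[x] \otimes [x]$ on $\mathbb{C}\mathcal{X} \otimes \mathbb{C}\mathcal{X}$ and observe
that $I(P, \mathrm{id}) = I(\mathrm{id}_1 \wedge \mathrm{id}_2) = H(P)$. Now to apply the information bound let $\psi : \mathfrak{Y} \to \mathbb{C}\mathcal{X}$ such that $W = \psi_*$:

$$\underbrace{I(\mathrm{id}_1 \wedge \mathrm{id}_2)}_{= H(P)} \ge \underbrace{I(\mathrm{id}_1 \wedge \psi)}_{= I(P, W)} \ge \underbrace{I(\mathrm{id}_1 \wedge \psi \circ \varphi)}_{= I(P, \varphi_* \circ W)}$$

□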
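The chain of inequalities can be made concrete with a small numerical sketch. As an assumed test case we take a binary channel with pure letter states $|0\rangle$ and $|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$, uniform $P$, and as observable a von Neumann measurement in a basis rotated by $\pi/8$; base-2 logarithms are used and all names in the code are hypothetical.

```python
import math

def h(x):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def shannon(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Letter states |0> and |+> as real amplitude pairs (assumed example).
s = 1 / math.sqrt(2)
letters = [(1.0, 0.0), (s, s)]
P = [0.5, 0.5]

# I(P, W) = H(PW) - sum_x P(x) H(W_x). The letters are pure, so H(W_x) = 0,
# and an equal mixture of two pure states has eigenvalues (1 +/- |<a|b>|)/2.
overlap = abs(letters[0][0] * letters[1][0] + letters[0][1] * letters[1][1])
holevo = h((1 + overlap) / 2)

# Measurement basis |u>, |v>, rotated by pi/8 against |0>, |1>.
t = math.pi / 8
basis = [(math.cos(t), -math.sin(t)), (math.sin(t), math.cos(t))]

# Induced classical channel: channel[x][y] = |<basis_y | letter_x>|^2.
channel = [[(a[0] * b[0] + a[1] * b[1]) ** 2 for b in basis] for a in letters]

# Classical mutual information I(P, Y_* o W) of the induced channel.
out = [sum(P[x] * channel[x][y] for x in range(2)) for y in range(2)]
measured = shannon(out) - sum(P[x] * shannon(channel[x]) for x in range(2))

print(measured, "<=", holevo, "<=", shannon(P))
```

In this instance both inequalities are strict, which is the typical situation for non-commuting letter states.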

The formulation of the Holevo bound is of course in the manner of a data processing inequality, data processing in 
the sense of composition of two quantum operations. We can also formulate it in the language of observables, just 
like for classical correlated random variables: 
For this consider the following state on $\mathfrak{X} \otimes \mathfrak{Y} \otimes \mathfrak{Z}$

$$\gamma = \sum_x P(x)\, x \otimes W_x \otimes \varphi_*(W_x)$$

which represents the correlation of the three stages of the system: preparation, reception, and detection of the signal
(again note that this is artificial in the material sense). The data processing inequality now is in the familiar form
$I(\mathfrak{X} \wedge \mathfrak{Z}) \le I(\mathfrak{X} \wedge \mathfrak{Y})$. For proof check identity of the information terms with those in the Holevo bound.
We might want to try to imitate the well known classical proof for random variables: by obvious chain rules 

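$$I(\mathfrak{X} \wedge \mathfrak{Y}\mathfrak{Z}) = I(\mathfrak{X} \wedge \mathfrak{Y}|\mathfrak{Z}) + I(\mathfrak{X} \wedge \mathfrak{Z}) = I(\mathfrak{X} \wedge \mathfrak{Z}|\mathfrak{Y}) + I(\mathfrak{X} \wedge \mathfrak{Y})$$

Since $I(\mathfrak{X} \wedge \mathfrak{Y}|\mathfrak{Z}) \ge 0$ the inequality would be proved if we could show that $I(\mathfrak{X} \wedge \mathfrak{Z}|\mathfrak{Y}) = 0$: but this is not true, as we
will show immediately by example! Before we do that however let us discuss our definition of $\gamma$. Observe that it does not,
even in the classical case, reflect the dependence of $\mathfrak{Z}$ on $\mathfrak{Y}$ correctly: $W_x$ is a mixture of pure (deterministic) states, say
$W_x = \sum_y W(y|x)V_y$ (classically of course this is unique, and $V_y$ is just $y$), and $\varphi_*$ individually transforms these states.
Thus a better choice would be

$$\gamma = \sum_{x,y} P(x)W(y|x)\, x \otimes V_y \otimes \varphi_*(V_y)$$

Note that this does not change $I(\mathfrak{X} \wedge \mathfrak{Z})$ or $I(\mathfrak{X} \wedge \mathfrak{Y})$. On the other hand the decomposition of $W_x$ is no longer unique
in the quantum case. In our example however the $W_x$ will be pure, so there is in fact no question of decomposition: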

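Example 2 Consider a binary channel, i.e. $\mathcal{X} = \{0, 1\}$, $\mathfrak{Y} = \mathcal{L}(\mathbb{C}^2)$. In $\mathbb{C}^2$ fix an orthonormal basis $|0\rangle, |1\rangle$ and let
$|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$. Let $W_0 = |0\rangle\langle 0|$, $W_1 = |+\rangle\langle +|$, and $P$ the uniform distribution.

In the first scenario let $\varphi_* = \mathrm{id}$, so

$$\gamma = \frac{1}{2}[0] \otimes |0\rangle\langle 0| \otimes |0\rangle\langle 0| + \frac{1}{2}[1] \otimes |+\rangle\langle +| \otimes |+\rangle\langle +|$$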



and a short calculation shows

$$H(\mathfrak{X}|\mathfrak{Y}) = 1 - h(\cos^2 \tfrac{\pi}{8}) \approx .399$$
$$H(\mathfrak{X}|\mathfrak{Y}\mathfrak{Z}) = 1 - h(\cos^2 \tfrac{\pi}{6}) \approx .189$$

The difference is easily explained: in the second quantity one has access to two clones of the original state $W_x$, so
$x$ can be identified better.

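This principle of doubly using quantum information in a forbidden way is still possible even if we insist that $\varphi$ should
be a measurement: in the second scenario $\varphi_*$ is the external operation of a von Neumann measurement in the basis

$$|u\rangle = \cos\tfrac{\pi}{8}|0\rangle - \sin\tfrac{\pi}{8}|1\rangle, \quad |v\rangle = \sin\tfrac{\pi}{8}|0\rangle + \cos\tfrac{\pi}{8}|1\rangle$$

Thus (with $a = \cos^2\tfrac{\pi}{8} = \frac{1 + \sqrt{1/2}}{2}$)

$$\gamma = \frac{1}{2}[0] \otimes |0\rangle\langle 0| \otimes (a[0] + (1 - a)[1]) + \frac{1}{2}[1] \otimes |+\rangle\langle +| \otimes ((1 - a)[0] + a[1])$$

and an easy calculation shows (with $\beta = \frac{1 + \sqrt{1 - 2a(1 - a)}}{2} = \frac{1 + \sqrt{3/4}}{2}$)

$$H(\mathfrak{X}|\mathfrak{Y}) = 1 - h(a) \approx .399$$
$$H(\mathfrak{X}|\mathfrak{Y}\mathfrak{Z}) = h(a) - h(\beta) \approx .246$$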

Again the reason for the failure is the same (and it is unknown in the classical theory): in $\gamma$ we consider states as
coexistent which never can coexist, because the third stage evolves from the second by an operation (a measurement)
which necessarily disturbs the system: we neglected this very fact in constructing $\gamma$, and we had to: otherwise we
could not have incorporated both stages of the evolution, the one after $W$, and the one after $\varphi_*$.
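The numbers of the example (.399, .189 and .246) can be reproduced with a few lines of code. The sketch below assumes base-2 logarithms and uses the elementary fact that $p|a\rangle\langle a| + q|b\rangle\langle b|$ has the nonzero eigenvalues $\frac{1}{2}\big(s \pm \sqrt{s^2 - 4pq(1 - |\langle a|b\rangle|^2)}\big)$ with $s = p + q$; the helper names are hypothetical.

```python
import math

def entropy(eigs):
    """Entropy in bits of a spectrum; zero eigenvalues contribute nothing."""
    return -sum(l * math.log2(l) for l in eigs if l > 1e-15)

def two_state_spectrum(p, q, overlap_sq):
    """Nonzero eigenvalues of p|a><a| + q|b><b| with |<a|b>|^2 = overlap_sq."""
    s = p + q
    d = math.sqrt(s * s - 4 * p * q * (1 - overlap_sq))
    return [(s + d) / 2, (s - d) / 2]

# First scenario (phi_* = id). The joint states are block diagonal in x with
# pure blocks, so H(XY) = H(XYZ) = H(P) = 1.
h_y = entropy(two_state_spectrum(0.5, 0.5, 0.5))      # |<0|+>|^2 = 1/2
h_yz = entropy(two_state_spectrum(0.5, 0.5, 0.25))    # |<00|++>|^2 = 1/4
s1_x_given_y = 1.0 - h_y
s1_x_given_yz = 1.0 - h_yz

# Second scenario (measurement in the basis |u>, |v>), a = cos^2(pi/8).
a = math.cos(math.pi / 8) ** 2
h_xyz = 1.0 + entropy([a, 1 - a])          # H(P) plus entropy of the outcome
# The state on YZ is block diagonal in the classical outcome z.
block0 = two_state_spectrum(a / 2, (1 - a) / 2, 0.5)
block1 = two_state_spectrum((1 - a) / 2, a / 2, 0.5)
h_yz2 = entropy(block0 + block1)
s2_x_given_y = s1_x_given_y                # the XY marginal is unchanged
s2_x_given_yz = h_xyz - h_yz2

print(round(s1_x_given_y, 3), round(s1_x_given_yz, 3), round(s2_x_given_yz, 3))
```

In both scenarios the difference $H(\mathfrak{X}|\mathfrak{Y}) - H(\mathfrak{X}|\mathfrak{Y}\mathfrak{Z})$ is strictly positive, so the conditional mutual information between $\mathfrak{X}$ and $\mathfrak{Z}$ given $\mathfrak{Y}$ does not vanish.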

After this digression we turn to an application of the Holevo bound: with the above notation 

Theorem 3 (Upper capacity bounds) The capacity region of the quantum multiway channel is contained in the
closure of all nonnegative $(R_1, \ldots, R_s)$ satisfying

$$\forall J \subset [s],\ j \in [r]: \quad \sum_{i \in J} R_i \le \sum_u q_u I_{\gamma_u}(\mathfrak{X}(J) \wedge \mathfrak{Y}_j \mathfrak{X}(J^c))$$

for some channel states $\gamma_u$ (belonging to appropriate input distributions) and $q_u \ge 0$, $\sum_u q_u = 1$.
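Proof. Assume an $(n, \epsilon)$-code $(f_1, \ldots, f_s, Y_1, \ldots, Y_r)$. Then the uniform distribution on the codewords induces a
channel state $\gamma$ on $(\mathfrak{X}_1 \cdots \mathfrak{X}_s \mathfrak{Y})^{\otimes n}$. Its restriction to the $u$-th copy in this tensor power will be denoted $\gamma_u$. Let
$j \in [r]$, $J \subset [s]$, and write $R(J) = \sum_{i \in J} R_i$. By the Fano inequality VII.14 (and corollary) we have

$$H(\mathfrak{X}^{\otimes n}(J) | \mathfrak{Y}_j^{\otimes n} \mathfrak{X}^{\otimes n}(J^c)) \le 1 + \epsilon \cdot nR(J)$$

With

$$H(\mathfrak{X}^{\otimes n}(J) | \mathfrak{Y}_j^{\otimes n} \mathfrak{X}^{\otimes n}(J^c)) = H(\mathfrak{X}^{\otimes n}(J)) - I(\mathfrak{X}^{\otimes n}(J) \wedge \mathfrak{Y}_j^{\otimes n} \mathfrak{X}^{\otimes n}(J^c)) = nR(J) - I(\mathfrak{X}^{\otimes n}(J) \wedge \mathfrak{Y}_j^{\otimes n} \mathfrak{X}^{\otimes n}(J^c))$$

we conclude (with theorem VII.7 and corollary)

$$(1 - \epsilon)R(J) \le \frac{1}{n} + \frac{1}{n} I_{\gamma}(\mathfrak{X}^{\otimes n}(J) \wedge \mathfrak{Y}_j^{\otimes n} \mathfrak{X}^{\otimes n}(J^c)) \le \frac{1}{n} + \frac{1}{n} \sum_{u=1}^{n} I_{\gamma_u}(\mathfrak{X}(J) \wedge \mathfrak{Y}_j \mathfrak{X}(J^c))$$

The right hand side has the form required in the theorem, with $q_u = 1/n$. □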



Note 4 In the case of classical channels the region described in the theorem is the exact capacity region (i.e. all the
rates there are achievable), as was first proved by Ahlswede [31].



Note 5 The significance of the Holevo bound lies in that we can with it and the Fano inequality derive an upper bound
on the capacity of a quantum channel. Holevo and independently Schumacher and Westmoreland recently showed
that in the case $\mathfrak{Y} = \mathcal{L}(\mathcal{H})$ this bound can be achieved. Achievability in the case of the multiple access channel
($r = 1$) and for general $\mathfrak{Y}$ has also been demonstrated.

We conjecture that also in the general case of r > 1 the theorem gives already the right capacity region. 



C. Broadcast channels 



To end this section let us think a bit about the quantum analog of the broadcast channel (see also recent work
by Allahverdyan and Saakian): suppose a sender wants to transmit messages from two sets to two receivers
over the same quantum channel (like a TV-station with several programs). Receiver 1 is interested in part 1 of the
message, receiver 2 in part 2, both in a common part 0. A model of this situation is a map $W : \mathcal{X} \to \mathfrak{S}(\mathfrak{Y})$ for the
channel, and two *-subalgebras $\mathfrak{Y}_1, \mathfrak{Y}_2$ of $\mathfrak{Y}$ for the two receivers: the triple $(W, \mathfrak{Y}_1, \mathfrak{Y}_2)$ we call a broadcast channel.
If these subalgebras are compatible we call the system plug-and-play (because then each receiver may choose any
observable without interfering with the other; in the other case they may have to agree on compatible observables,
or prescribe the order of access to the data). An $n$-block code for this channel is a triple $(f, D_1, D_2)$ with a map
$f : \mathcal{M}_0 \times \mathcal{M}_1 \times \mathcal{M}_2 \to \mathcal{X}^n$ and compatible observables $D_i$ in $\mathfrak{Y}_i^{\otimes n}$, indexed by $\mathcal{M}'_0 \times \mathcal{M}'_i \supset \mathcal{M}_0 \times \mathcal{M}_i$ ($i = 1, 2$). The
(maximum) error probability of the code is

$$e(f, D_1, D_2) = \max\{1 - \mathrm{Tr}(W_{f(m_0, m_1, m_2)} D_{1, m_0 m_1} D_{2, m_0 m_2}) \,|\, m_i \in \mathcal{M}_i,\ i = 0, 1, 2\}$$

(and analogously the average error probability $\bar{e}$). If it is at most $\epsilon$ we speak of an $(n, \epsilon)$-code ($(n, \bar{\epsilon})$-code, respectively).

The rate of the code is, as expected, the triple $(R_1, R_0, R_2) = (\frac{1}{n} \log |\mathcal{M}_1|, \frac{1}{n} \log |\mathcal{M}_0|, \frac{1}{n} \log |\mathcal{M}_2|)$; and the
problem is to determine the capacity region. This is an exceedingly difficult problem, not even solved completely in
the classical case ($\mathfrak{Y}$ commutative).

Thus we may consider a restricted situation, which has in the classical case a complete solution: the degraded broadcast
channel: here the line to receiver 2 "factors" through 1, i.e. the degraded broadcast channel is a triple $(W, \varphi_*, \mathfrak{Y}_1)$
with $W$ as above, and a quantum operation $\varphi_* : \mathfrak{Y}_* \to \mathfrak{Y}_{2*}$. Receiver 1 is the *-subalgebra $\mathfrak{Y}_1$ of $\mathfrak{Y}$, receiver 2 the
algebra $\mathfrak{Y}_2$. This links with the previous explanation via the definitions $W_2 = \varphi_* \circ W$, $W_1 = \iota_* \circ W$ (for the inclusion
$\iota : \mathfrak{Y}_1 \hookrightarrow \mathfrak{Y}$). This however does not give the correct picture because this model is manifestly not plug-and-play: the
second receiver has to take what the first left to him. Formally: an $n$-block code $(f, D_1, D_2)$ now consists of $f$ as
before, and also the observable $D_2$ on $\mathfrak{Y}_2^{\otimes n}$, and a subtle modification of $D_1$ to the operation $\tilde{D}_1 = \psi_*$, for a quantum
operation

$$\psi : m_0 \otimes m_1 \otimes A \mapsto E^*_{m_0 m_1} A E_{m_0 m_1}$$

with $E_{m_0 m_1} \in \mathfrak{Y}_1^{\otimes n}$. Obviously $\mathrm{Tr}_{\mathfrak{Y}^{\otimes n}} \circ \tilde{D}_1 = D_1$ for the observable $D_1$ indexed by $\mathcal{M}_0 \times \mathcal{M}_1$ and consisting of the
operators $D_{1, m_0 m_1} = E_{m_0 m_1} E^*_{m_0 m_1}$. With this we can formulate the error probability:

$$e(f, D_1, D_2) = \max\{1 - \mathrm{Tr}(E^*_{m_0 m_1} W_{f(m_0, m_1, m_2)} E_{m_0 m_1} D_{2, m_0 m_2}) \,|\, m_i \in \mathcal{M}_i,\ i = 0, 1, 2\}$$

(analogous for the average error probability). In direct analogy with the classical situation we present the following 

Conjecture 6 For $\mathfrak{Y}_1 = \mathfrak{Y}$ the rate region is the convex hull of the triples $(R_1, R_0, R_2)$ with

$$R_1 \le I(V(\cdot|u), W | Q)$$
$$R_0 + R_2 \le I(Q, \varphi_* \circ W \circ V)$$
$$R_1 + R_0 + R_2 \le I(QV, W)$$

where $Q$ is a distribution on a finite set $\mathcal{U}$, and $V$ a classical channel from $\mathcal{U}$ to $\mathcal{X}$.



D. Open problems 



Note 7 Meaning of the Holevo bound (theorem VIII.1) for coding theorems: The reason why for truly quantum channels one has strict
inequality is that we cannot detect the $W_x$ optimally in one common basis (for simplicity assume that we only employ
von Neumann measurements). Assume we choose an eigenbasis of $PW$; then we "see" correctly the entropy $H(PW)$
of the output state, but for the letter states we introduce some additional entropy to their $H(W_x)$. Thus we get too low
a mutual information because our measurements introduce noise. We want this noise increase to be small by choosing
codewords appropriately, and then "approximating" with a von Neumann observable that all the codeword states nearly
commute with. The problem here is to do this such that the von Neumann mutual information remains the same.
Note that this is a different approach to coding than those used so far: there we directly construct codes approaching
a certain rate, using general observables. Here we would have a von Neumann observable approaching the Holevo bound,
i.e. a classical channel for which we may construct codes by the known classical techniques.



Note 8 For classical-quantum channels there does not appear to exist a reasonable notion of transpose channel. If 
however we see a channel as a quantum map from any one system to another, then given an input state one can define 
formally a transpose channel under certain circumstances, see |J^/. This goes the opposite direction as the original 
channel, so in our case we get a measurement operation. It is to be explored whether this notion gives us new insight 
in the communication problem. In particular we may relate the classical-quantum channels with quantum-classical 
channels (i.e. fixed measurements, or if variable only product measurements). Maybe we can even prove that coding 
classical information with entangled states in quantum-quantum channels yields higher capacities... 